Abstract. This paper provides the first general technique for proving information lower bounds on two-party unbounded-rounds communication problems. We show that the discrepancy lower bound, which applies to randomized communication complexity, also applies to information complexity. More precisely, if the discrepancy of a two-party function $f$ with respect to a distribution $\mu$ is $\mathrm{Disc}_\mu(f)$, then any two-party randomized protocol computing $f$ must reveal at least $\Omega(\log(1/\mathrm{Disc}_\mu(f)))$ bits of information to the participants. As a corollary, we obtain that any two-party protocol for computing a random function on $\{0,1\}^n \times \{0,1\}^n$ must reveal $\Omega(n)$ bits of information to the participants.
In addition, we prove that the discrepancy of the Greater-Than function is $O(1/\sqrt{n})$, which provides an alternative to the recent proof of Viola [Vio11] of the $\Omega(\log n)$ lower bound on the communication complexity of this well-studied function and, combined with our main result, proves the tight $\Omega(\log n)$ lower bound on its information complexity.
The proof of our main result develops a new simulation procedure that may be of independent interest. In a very recent breakthrough work of Kerenidis et al. [KLL+12], this simulation procedure was a building block towards a proof that almost all known lower bound techniques for communication complexity (and not just discrepancy) apply to information complexity.



1 Introduction 

The main objective of this paper is to expand the available techniques for proving information complexity lower bounds for communication problems. Let $f : \mathcal{X} \times \mathcal{Y} \to \{0,1\}$ be a function, and $\mu$ be a distribution on $\mathcal{X} \times \mathcal{Y}$. Informally, the information complexity of $f$ is the least amount of information that Alice and Bob need to exchange on average to compute $f(x,y)$ using a randomized communication protocol if initially $x$ is given to Alice, $y$ is given to Bob, and $(x,y) \sim \mu$. Note that information here is measured in the Shannon sense, and the amount of information may be much smaller than the number of bits exchanged. Thus the randomized communication complexity of $f$ is an upper bound on its information complexity, but may not be a lower bound.

Within the context of communication complexity, information complexity was first introduced in the context of direct sum theorems for randomized communication complexity [CSWY01, BYJKS04, BBCR10]. These techniques are also being used in the related direction of direct product theorems [KSDW04, LSS08, Jai10, Kla10]. A direct sum theorem in a computational model states that the amount of resources needed to perform $k$ independent tasks is roughly $k$ times the amount of resources $c$ needed for computing a single task. A direct product theorem, which is a stronger statement, asserts that any attempt to solve $k$ independent tasks using $o(kc)$ resources would result in an exponentially small success probability $2^{-\Omega(k)}$.

The direct sum line of work [HJMR07, JSR08, BBCR10, BR11] has eventually led to a tight connection (equality) between amortized communication complexity and information complexity. Thus proving lower bounds on the communication complexity of $k$ copies of $f$ for a growing $k$ is equivalent to proving lower bounds on the information complexity of $f$. In particular, if $f$ satisfies $\mathrm{IC}(f) = \Omega(\mathrm{CC}(f))$, i.e. its information cost is asymptotically equal to its communication complexity, then a strong direct sum theorem holds for $f$. In addition to the intrinsic interest of understanding the amount of information exchange that needs to be involved in computing $f$, direct sum theorems motivate the development of techniques for proving lower bounds on the information complexity of functions.

Another important motivation for seeking lower bounds on the information complexity of functions stems from understanding the limits of security in two-party computation. In a celebrated result, Ben-Or et al. [BOGW88] (see also [AL11]) showed how a multi-party computation (with three or more parties) may be carried out in a way that reveals no information to the participants except for the computation's output. The protocol relies heavily on the use of random bits that are shared between some, but not all, parties. Such a resource clearly cannot exist in the two-party setting. While it can be shown that perfect information security is unattainable by two-party protocols [CK89, BYCKO93], quantitatively it is not clear just how much information the parties must "leak" to each other to compute $f$. The quantitative answer depends on the model in which the leakage occurs, and whether quantum computation is allowed [Kla04]. Information complexity answers this question in the strongest possible sense for classical protocols: the parties are allowed to use private randomness to help them "hide" their information, and the information revealed is measured on average. Thus an information complexity lower bound of $I$ on a problem implies that the average (as opposed to worst-case) amount of information revealed to the parties is at least $I$.

As mentioned above, the information complexity is always upper bounded by the communication complexity of $f$. The converse is not known to be true. Moreover, lower bound techniques for communication complexity do not readily translate into lower bound techniques for information complexity. The key difference is that a low-information protocol is not limited in the amount of communication it uses, and thus rectangle-based communication bounds do not readily convert into information lower bounds. No general technique has been known to yield sharp information complexity lower bounds. A linear lower bound on the communication complexity of the disjointness function has been shown in [Raz92]. An information-theoretic proof of this lower bound [BYJKS04] can be adapted to prove a linear information lower bound on disjointness [Bra11]. One general technique for obtaining (weak) information complexity lower bounds was introduced in [Bra11], where it has been shown that any function that has $I$ bits of information complexity has communication complexity bounded by $2^{O(I)}$. This immediately implies that the information complexity of a function $f$ is at least the log of its communication complexity ($\mathrm{IC}(f) \geq \Omega(\log(\mathrm{CC}(f)))$). In fact, this result easily follows from the stronger result we prove below (Theorem 2).

1.1 Our results 

In this paper we prove that the discrepancy method - a general communication complexity lower bound technique - generalizes to information complexity. The discrepancy of $f$ with respect to a distribution $\mu$ on inputs, denoted $\mathrm{Disc}_\mu(f)$, measures how "unbalanced" $f$ can get on any rectangle, where the balancedness is measured with respect to $\mu$:

$$\mathrm{Disc}_\mu(f) \stackrel{def}{=} \max_{\text{rectangles } R} \left| \Pr_\mu[f(x,y) = 0 \wedge (x,y) \in R] - \Pr_\mu[f(x,y) = 1 \wedge (x,y) \in R] \right|.$$
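To make the definition concrete, the following minimal Python sketch computes the discrepancy by brute force over all rectangles. It is only feasible for tiny input sets; the function names (`discrepancy`, `inner_product`) and the choice of $n = 2$ are illustrative assumptions, not part of the paper.

```python
from itertools import product

def discrepancy(f, mu, xs, ys):
    """Brute-force Disc_mu(f): maximize the bias over all rectangles S x T."""
    best = 0.0
    for s_mask in product([0, 1], repeat=len(xs)):
        rows = [x for x, keep in zip(xs, s_mask) if keep]
        for t_mask in product([0, 1], repeat=len(ys)):
            cols = [y for y, keep in zip(ys, t_mask) if keep]
            # Pr[f = 0 and (x,y) in R] - Pr[f = 1 and (x,y) in R]
            bias = sum(mu[(x, y)] * (1 if f(x, y) == 0 else -1)
                       for x in rows for y in cols)
            best = max(best, abs(bias))
    return best

def inner_product(x, y):
    return sum(a * b for a, b in zip(x, y)) % 2

n = 2  # illustrative size; the search is exponential in 2^n
xs = list(product([0, 1], repeat=n))
ys = list(product([0, 1], repeat=n))
uniform = {(x, y): 1.0 / (len(xs) * len(ys)) for x in xs for y in ys}

disc_ip = discrepancy(inner_product, uniform, xs, ys)
disc_const = discrepancy(lambda x, y: 0, uniform, xs, ys)
print(disc_ip, disc_const)
```

A constant function is maximally unbalanced on the full rectangle, so its discrepancy is 1, while the inner product function stays below the standard $2^{-n/2}$ bound under the uniform distribution.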

A well-known lower bound (see e.g. [KN97]) asserts that the distributional communication complexity of $f$, denoted $D^\mu_{1/2-\epsilon}(f)$, when required to predict $f$ with advantage $\epsilon$ over a random guess (with respect to $\mu$), is bounded from below by $\Omega(\log(1/\mathrm{Disc}_\mu(f)))$:

$$D^\mu_{1/2-\epsilon}(f) \geq \log(2\epsilon/\mathrm{Disc}_\mu(f)).$$

Note that the lower bound holds even if we are merely trying to get an advantage of $\epsilon = \sqrt{\mathrm{Disc}_\mu(f)}$ over random guessing in computing $f$. We prove that the information complexity of computing $f$ with probability 9/10 with respect to $\mu$ is also bounded from below by $\Omega(\log(1/\mathrm{Disc}_\mu(f)))$.

Theorem 1. Let $f : \mathcal{X} \times \mathcal{Y} \to \{0,1\}$ be a Boolean function and let $\mu$ be any probability distribution on $\mathcal{X} \times \mathcal{Y}$. Then

$$\mathrm{IC}_\mu(f, 1/10) \geq \Omega(\log(1/\mathrm{Disc}_\mu(f))).$$

Remark 1. The choice of 9/10 is somewhat arbitrary. For randomized worst-case protocols, we may replace the success probability with $1/2 + \delta$ for a constant $\delta$, since repeating the protocol constantly many times ($1/\delta^2$) would yield the aforementioned success rate, while the information cost of the repeated protocol differs only by a constant factor from the original one. In particular, using prior-free information cost [Bra11] this implies $\mathrm{IC}(f, 1/2 - \delta) \geq \Omega(\delta^2 \log(1/\mathrm{Disc}_\mu(f)))$.

In particular, Theorem 1 implies a linear lower bound on the information complexity of the inner product function $IP(x,y) = \sum_{i=1}^{n} x_i y_i \bmod 2$, and on a random boolean function $f_r : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}$, expanding the (limited) list of functions for which nontrivial information-complexity lower bounds are known:

Corollary 1. The information complexity $\mathrm{IC}_{uniform}(IP, 1/10)$ of $IP(x,y)$ is $\Omega(n)$. The information complexity $\mathrm{IC}_{uniform}(f_r, 1/10)$ of a random function $f_r$ is $\Omega(n)$, except with probability $2^{-\Omega(n)}$.

We study the communication and information complexity of the Greater-Than function ($GT_n$) on numbers of length $n$. This is a very well-studied problem [Smi88, MNSW95, KN97]. Only very recently was the tight lower bound of $\Omega(\log n)$ in the public-coins probabilistic model given by Viola [Vio11]. We show that the discrepancy of the $GT_n$ function is $O(1/\sqrt{n})$:

Lemma 1. There exists a distribution $\mu_n$ on $\mathcal{X} \times \mathcal{Y}$ such that the discrepancy of $GT_n$ with respect to $\mu_n$ satisfies

$$\mathrm{Disc}_{\mu_n}(GT_n) \leq \frac{20}{\sqrt{n}}.$$

We defer the proof to the appendix. Lemma 1 provides an alternative (arguably simpler) proof of Viola's [Vio11] lower bound on the communication complexity of $GT_n$. By Theorem 1, Lemma 1 immediately implies a lower bound on the information complexity of $GT_n$:

Corollary 2. $\mathrm{IC}_{\mu_n}(GT_n, 1/10) = \Omega(\log n)$.

This settles the information complexity of the GT function, since this problem can be solved by a randomized protocol with $O(\log n)$ communication (see [KN97]). This lower bound is particularly interesting since it demonstrates the first tight information-complexity lower bound for a natural function that is not linear.

The key technical idea in the proof of Theorem 1 is a new simulation procedure that allows us to convert any protocol that has information cost $I$ into a (two-round) protocol that has communication complexity $O(I)$ and succeeds with probability $\geq 1/2 + 2^{-O(I)}$, yielding a $2^{-O(I)}$ advantage over random guessing. Combined with the discrepancy lower bound for communication complexity, this proves Theorem 1.



1.2 Comparison and connections to prior results 

The most relevant prior work is an article by Lee, Shraibman, and Spalek [LSS08]. Improving on an earlier work of Shaltiel [Sha03], Lee et al. show a direct product theorem for discrepancy, proving that the discrepancy of $f^{\otimes k}$ - the $k$-wise XOR of a function $f$ with itself - behaves as $\mathrm{Disc}(f)^{\Omega(k)}$. This implies in particular that the communication complexity of $f^{\otimes k}$ scales at least as $\Omega(k \cdot \log(1/\mathrm{Disc}(f)))$. Using the fact that the limit as $k \to \infty$ of the amortized communication complexity of $f$ is equal to the information cost of $f$ [BR10], the result of Lee et al. "almost" implies the bound of Theorem 1. Unfortunately, the amortized communication complexity in the sense of [BR10] is the amortized cost of $k$ copies of $f$, where each copy is allowed to err with some probability (say 1/10). Generally speaking, this task is much easier than computing the XOR (which requires all copies to be evaluated correctly with high probability). Thus the lower bound that follows from Lee et al. applies to a more difficult problem, and does not imply the information complexity lower bound.

Another generic approach one may try to take is to use compression results such as [BBCR10] to lower bound the information cost from communication complexity lower bounds. The logic of such a proof would go as follows: "Suppose there was an information-complexity-$I$ protocol $\pi$ for $f$; then if one can compress it into a low-communication protocol one may get a contradiction to the communication complexity lower bound of $f$". Unfortunately, all known compression results compress $\pi$ into a protocol $\pi'$ whose communication complexity depends on $I$ but also on $\mathrm{CC}(\pi)$. Even for external information complexity (which is always at least as large as the internal information complexity), the bound obtained in [BBCR10] is of the form $I_{ext}(\pi) \cdot \mathrm{polylog}(\mathrm{CC}(\pi))$. Thus compression results of this type cannot rule out protocols that have low information complexity but a very high (e.g. exponential) communication complexity.

Our result can be viewed as a weak compression result for protocols, where a protocol for computing $f$ that conveys $I$ bits of information is converted into a protocol that uses $O(I)$ bits of communication and gives an advantage of $2^{-O(I)}$ in computing $f$. This strengthens the result in [Bra11], where a compression to $2^{O(I)}$ bits of communication has been shown. We still do not know whether compression to a protocol that uses $O(I)$ bits of communication and succeeds with high probability (as opposed to getting a small advantage over random) is possible.

In a very recent breakthrough work of Kerenidis, Laplante, Lerays, Roland, and Xiao [KLL+12], our main protocol played an important role in the proof that almost all known lower bound techniques for communication complexity (and not just discrepancy) apply to information complexity. The results of [KLL+12] shed more light on the information complexity of many communication problems, and the question of whether interactive communication can be compressed.



2 Preliminaries 



In an effort to make this paper as self-contained as possible, we provide some background on information theory and communication complexity, which is essential to proving our results. For further details and a more thorough treatment of these subjects see [BR10] and references therein.

Notation. We reserve capital letters for random variables and distributions, calligraphic letters for sets, and small letters for elements of sets. Throughout this paper, we often use the notation $|_b$ to denote conditioning on the event $B = b$. Thus $A|_b$ is shorthand for $A|B = b$.

We use the standard notion of statistical / total variation distance between two distributions.

Definition 1. Let $D$ and $F$ be two random variables taking values in a set $\mathcal{S}$. Their statistical distance is

$$|D - F| \stackrel{def}{=} \max_{\mathcal{T} \subseteq \mathcal{S}} \bigl( |\Pr[D \in \mathcal{T}] - \Pr[F \in \mathcal{T}]| \bigr) = \frac{1}{2} \sum_{s \in \mathcal{S}} |\Pr[D = s] - \Pr[F = s]|.$$

2.1 Information Theory 

Definition 2 (Entropy). The entropy of a random variable $X$ is $H(X) \stackrel{def}{=} \sum_x \Pr[X = x] \log(1/\Pr[X = x])$. The conditional entropy $H(X|Y)$ is defined as $E_{y \in_R Y}[H(X|Y = y)]$.

Definition 3 (Mutual Information). The mutual information between two random variables $A, B$, denoted $I(A;B)$, is defined to be the quantity $H(A) - H(A|B) = H(B) - H(B|A)$. The conditional mutual information $I(A;B|C)$ is $H(A|C) - H(A|BC)$.

We also use the notion of divergence (also known as Kullback-Leibler distance or relative entropy), which is a different way to measure the distance between two distributions:

Definition 4 (Divergence). The informational divergence between two distributions is

$$D(A\|B) \stackrel{def}{=} \sum_x A(x) \log(A(x)/B(x)).$$

Proposition 1. Let $A, B, C$ be random variables in the same probability space. For every $a$ in the support of $A$ and $c$ in the support of $C$, let $B_a$ denote $B|A = a$ and $B_{ac}$ denote $B|A = a, C = c$. Then $I(A;B|C) = E_{a,c \in_R A,C}[D(B_{ac}\|B_c)]$.
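As a sanity check on Proposition 1, the following Python sketch evaluates both sides of the identity on a randomly generated joint distribution. The helper names and the small alphabet sizes are assumptions made for illustration only.

```python
import math
import random

def conditional_mutual_information(joint):
    """I(A;B|C) for joint[(a, b, c)] = probability, computed via entropies."""
    def H(keys):
        marg = {}
        for abc, p in joint.items():
            k = tuple(abc[i] for i in keys)
            marg[k] = marg.get(k, 0.0) + p
        return -sum(p * math.log2(p) for p in marg.values() if p > 0)
    # I(A;B|C) = H(A|C) - H(A|BC) = H(A,C) + H(B,C) - H(A,B,C) - H(C)
    return H((0, 2)) + H((1, 2)) - H((0, 1, 2)) - H((2,))

def expected_divergence(joint):
    """E_{a,c}[ D(B_ac || B_c) ], the right-hand side of Proposition 1."""
    p_ac, p_c, p_bc = {}, {}, {}
    for (a, b, c), p in joint.items():
        p_ac[(a, c)] = p_ac.get((a, c), 0.0) + p
        p_c[c] = p_c.get(c, 0.0) + p
        p_bc[(b, c)] = p_bc.get((b, c), 0.0) + p
    total = 0.0
    for (a, b, c), p in joint.items():
        if p > 0:
            b_given_ac = p / p_ac[(a, c)]
            b_given_c = p_bc[(b, c)] / p_c[c]
            total += p * math.log2(b_given_ac / b_given_c)
    return total

rng = random.Random(0)  # fixed seed so the example is reproducible
weights = {(a, b, c): rng.random()
           for a in range(3) for b in range(3) for c in range(2)}
norm = sum(weights.values())
joint = {k: w / norm for k, w in weights.items()}

lhs = conditional_mutual_information(joint)
rhs = expected_divergence(joint)
print(lhs, rhs)
```

The two printed values should agree up to floating-point error, since both are the same sum over $(a,b,c)$ written in two ways.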

2.2 Communication Complexity 

We use the standard definitions of the computational model defined in [Yao79]. For complete details see section A of the appendix.

Given a communication protocol $\pi$, $\pi(x,y)$ denotes the concatenation of the public randomness with all the messages that are sent during the execution of $\pi$. We call this the transcript of the protocol. When referring to the random variable denoting the transcript, rather than a specific transcript, we will use the notation $\Pi(x,y)$, or simply $\Pi$ when $x$ and $y$ are clear from the context; thus $\pi(x,y) \in_R \Pi(x,y)$. When $x$ and $y$ are random variables themselves, we will denote the transcript by $\Pi(X,Y)$, or just $\Pi$.

Definition 5 (Communication Complexity notation). For a function $f : \mathcal{X} \times \mathcal{Y} \to \mathbb{Z}_K$, a distribution $\mu$ supported on $\mathcal{X} \times \mathcal{Y}$, and a parameter $\epsilon > 0$, $D^\mu_\epsilon(f)$ denotes the communication complexity of the cheapest deterministic protocol computing $f$ on inputs sampled according to $\mu$ with error $\epsilon$.

Definition 6 (Combinatorial Rectangle). A rectangle in $\mathcal{X} \times \mathcal{Y}$ is a subset $R \subseteq \mathcal{X} \times \mathcal{Y}$ which satisfies

$$(x_1, y_1) \in R \text{ and } (x_2, y_2) \in R \implies (x_1, y_2) \in R.$$



2.3 Information + Communication: The information cost of a 
protocol 



The following quantity, which is implicit in [BYJKS04] and was explicitly defined in [BBCR10], is the central notion of this paper.

Definition 7. The (internal) information cost of a protocol $\pi$ over inputs drawn from a distribution $\mu$ on $\mathcal{X} \times \mathcal{Y}$ is given by:

$$\mathrm{IC}_\mu(\pi) := I(\Pi; X|Y) + I(\Pi; Y|X).$$

Intuitively, Definition 7 captures how much the two parties learn about each other's inputs from the execution transcript of the protocol $\pi$. The first term captures what the second player learns about $X$ from $\Pi$: the mutual information between the input $X$ and the transcript $\Pi$ given the input $Y$. Similarly, the second term captures what the first player learns about $Y$ from $\Pi$.

Note that the information of a protocol $\pi$ depends on the prior distribution $\mu$, as the mutual information between the transcript $\Pi$ and the inputs depends on the prior distribution on the inputs. To give an extreme example, if $\mu$ is a singleton distribution, i.e. one with $\mu(\{(x,y)\}) = 1$ for some $(x,y) \in \mathcal{X} \times \mathcal{Y}$, then $\mathrm{IC}_\mu(\pi) = 0$ for all possible $\pi$, as no protocol can reveal anything to the players about the inputs that they do not already know a priori. Similarly, $\mathrm{IC}_\mu(\pi) = 0$ if $\mathcal{X} = \mathcal{Y}$ and $\mu$ is supported on the diagonal $\{(x,x) : x \in \mathcal{X}\}$. As expected, one can show that the communication cost $\mathrm{CC}(\pi)$ of $\pi$ is an upper bound on its information cost over any distribution $\mu$:

Lemma 2. [BR10] For any distribution $\mu$, $\mathrm{IC}_\mu(\pi) \leq \mathrm{CC}(\pi)$.
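The extreme examples above and the bound of Lemma 2 can be checked on the simplest possible protocol, in which Alice just sends her whole input. The sketch below is an illustration under that assumption (the protocol, the name `info_cost_send_x`, and the three test distributions are not from the paper). For this protocol the transcript is $\Pi = X$, so $I(\Pi;X|Y) = H(X|Y)$ and $I(\Pi;Y|X) = 0$.

```python
import math

def info_cost_send_x(mu):
    """IC_mu of the protocol whose transcript is Alice's input x.

    With Pi = X: I(Pi;X|Y) = H(X|Y) and I(Pi;Y|X) = 0.
    """
    p_y = {}
    for (x, y), p in mu.items():
        p_y[y] = p_y.get(y, 0.0) + p
    # H(X|Y) = sum_{x,y} mu(x,y) * log2( mu(y) / mu(x,y) )
    return sum(p * math.log2(p_y[y] / p) for (x, y), p in mu.items() if p > 0)

n = 3                      # inputs are n-bit numbers
inputs = range(2 ** n)
cc = n                     # Alice sends n bits

uniform = {(x, y): 1 / 4 ** n for x in inputs for y in inputs}
diagonal = {(x, x): 1 / 2 ** n for x in inputs}
singleton = {(5, 2): 1.0}

ic_uniform = info_cost_send_x(uniform)
ic_diagonal = info_cost_send_x(diagonal)
ic_singleton = info_cost_send_x(singleton)
print(ic_uniform, ic_diagonal, ic_singleton)
```

Under the uniform distribution the information cost meets the communication cost, while on the diagonal and singleton distributions the same $n$-bit protocol reveals nothing new.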

On the other hand, as noted in the introduction, the converse need not hold. In fact, by [BR10], getting a strict inequality in Lemma 2 is equivalent to violating the direct sum conjecture for randomized communication complexity.

As one might expect, the information cost of a function $f$ with respect to $\mu$ and error $p$ is the least amount of information that needs to be revealed by a protocol computing $f$ with error $\leq p$:

$$\mathrm{IC}_\mu(f, p) := \inf_{\pi :\ P_\mu[\pi(x,y) \neq f(x,y)] \leq p} \mathrm{IC}_\mu(\pi).$$

The (prior-free) information cost was defined in [Bra11] as the minimum amount of information that a worst-case error-$p$ randomized protocol can reveal against all possible prior distributions:

$$\mathrm{IC}(f, p) := \inf_{\pi \text{ is a protocol with } P[\pi(x,y) \neq f(x,y)] \leq p \text{ for all } (x,y)} \ \max_\mu \mathrm{IC}_\mu(\pi).$$

This information cost matches the amortized randomized communication complexity of $f$ [Bra11]. It is clear that lower bounds on $\mathrm{IC}_\mu(f, p)$ for any distribution $\mu$ also apply to $\mathrm{IC}(f, p)$.



3 Proof of Theorem 1



To establish the correctness of Theorem 1, we prove the following theorem, which is the central result of this paper:

Theorem 2. Suppose that $\mathrm{IC}_\mu(f, 1/10) = I_\mu$. Then there exists a protocol $\pi'$ such that

- $\mathrm{CC}(\pi') = O(I_\mu)$.

- $P_{(x,y) \sim \mu}[\pi'(x,y) = f(x,y)] \geq 1/2 + 2^{-O(I_\mu)}$.

We first show how Theorem 1 follows from Theorem 2.

Proof of Theorem 1. Let $f$, $\mu$ be as in Theorem 1, and let $\mathrm{IC}_\mu(f, 1/10) = I_\mu$. By Theorem 2, there exists a protocol $\pi'$ computing $f$ with error probability $1/2 - 2^{-O(I_\mu)}$ using $O(I_\mu)$ bits of communication. Applying the discrepancy lower bound for communication complexity we obtain

$$O(I_\mu) \geq D^\mu_{1/2 - 2^{-O(I_\mu)}}(f) \geq \log\bigl(2 \cdot 2^{-O(I_\mu)} / \mathrm{Disc}_\mu(f)\bigr) \quad (1)$$

which after rearranging gives $I_\mu \geq \Omega(\log(1/\mathrm{Disc}_\mu(f)))$, as desired.
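The rearrangement of (1) can be spelled out by naming the constants hidden in the $O(\cdot)$ notation. In the sketch below, $c_1$ and $c_2$ are assumed names (not used in the paper) for the constants in the communication bound $\mathrm{CC}(\pi') \leq c_1 I_\mu$ and in the advantage $2^{-c_2 I_\mu}$ of Theorem 2.

```latex
\begin{align*}
c_1 I_\mu
  &\ge \log\bigl(2 \cdot 2^{-c_2 I_\mu} / \mathrm{Disc}_\mu(f)\bigr)
   = 1 - c_2 I_\mu + \log\bigl(1/\mathrm{Disc}_\mu(f)\bigr) \\
\Longrightarrow\quad
(c_1 + c_2)\, I_\mu
  &\ge 1 + \log\bigl(1/\mathrm{Disc}_\mu(f)\bigr) \\
\Longrightarrow\quad
I_\mu
  &\ge \frac{\log\bigl(1/\mathrm{Disc}_\mu(f)\bigr)}{c_1 + c_2}
   = \Omega\bigl(\log(1/\mathrm{Disc}_\mu(f))\bigr).
\end{align*}
```

The key point is that the term $c_2 I_\mu$ lost to the small advantage is of the same order as the communication itself, so it can be absorbed into the left-hand side.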

We now turn to prove Theorem 2. The main step is the following sampling lemma.

Lemma 3. Let $\mu$ be any distribution over a universe $\mathcal{U}$ and let $I > 0$ be a parameter that is known to both A and B. Further, let $\nu_A$ and $\nu_B$ be two distributions over $\mathcal{U}$ such that $D(\mu\|\nu_A) \leq I$ and $D(\mu\|\nu_B) \leq I$. The players are each given a pair of real functions $(p_A, q_A)$, $(p_B, q_B)$, $p_A, q_A, p_B, q_B : \mathcal{U} \to [0,1]$, such that for all $x \in \mathcal{U}$, $\mu(x) = p_A(x) \cdot p_B(x)$, $\nu_A(x) = p_A(x) \cdot q_A(x)$, and $\nu_B(x) = p_B(x) \cdot q_B(x)$. Then there is a (two round) sampling protocol $\Pi_1 = \Pi_1(p_A, p_B, q_A, q_B, I)$ which has the following properties:

1. at the end of the protocol, the players either declare that the protocol "fails", or output $x_A \in \mathcal{U}$ and $x_B \in \mathcal{U}$, respectively ("success").

2. let $S$ be the event that the players output "success". Then $S \Rightarrow x_A = x_B$, and

$$0.9 \cdot 2^{-50(I+1)} \leq \Pr[S] \leq 2^{-50(I+1)}.$$

3. if $\mu_1$ is the distribution of $x_A$ conditioned on $S$, then $|\mu - \mu_1| \leq 2/9$.

Furthermore, $\Pi_1$ can be "compressed" to a protocol $\Pi_2$ such that $\mathrm{CC}(\Pi_2) = 211 I + 1$, whereas $|\Pi_1 - \Pi_2| \leq 2^{-59I}$ (by an abuse of notation, here we identify $\Pi_i$ with the random variable representing its output).

We will use the following technical fact about the information divergence of distributions.

Claim 3. [Claim 5.1 in [Bra11]] Suppose that $D(\mu\|\nu) \leq I$. Let $\epsilon$ be any parameter. Then

$$\mu\bigl\{x : 2^{(I+1)/\epsilon} \cdot \nu(x) < \mu(x)\bigr\} < \epsilon.$$

For completeness, we repeat the proof in the appendix.

Proof (of Lemma 3). Throughout the execution of $\Pi_1$, Alice and Bob interpret their shared random tape as a source of points $(x_i, \alpha_i, \beta_i)$ uniformly distributed in $\mathcal{U} \times [0, 2^{50(I+1)}] \times [0, 2^{50(I+1)}]$. Alice and Bob consider the first $T = |\mathcal{U}| \cdot 2^{100(I+1)} \cdot 60I$ such points. Their goal will be to discover the first index $\tau$ such that $\alpha_\tau \leq p_A(x_\tau)$ and $\beta_\tau \leq p_B(x_\tau)$ (where they wish to find it using a minimal amount of communication, even if they are most likely to fail). First, we note that the probability that an index $t$ satisfies $\alpha_t \leq p_A(x_t)$ and $\beta_t \leq p_B(x_t)$ is exactly $1/(|\mathcal{U}| 2^{50(I+1)} 2^{50(I+1)}) = 1/(|\mathcal{U}| 2^{100(I+1)})$. Hence the probability that $\tau > T$ (i.e. that $x_\tau$ is not among the $T$ points considered) is bounded by

$$\left(1 - \frac{1}{|\mathcal{U}| 2^{100(I+1)}}\right)^{T} < e^{-T/(|\mathcal{U}| 2^{100(I+1)})} = e^{-60I} < 2^{-60I}. \quad (2)$$

Denote by $\mathcal{A}$ the following set of indices: $\mathcal{A} := \{i \leq T : \alpha_i \leq p_A(x_i) \text{ and } \beta_i \leq 2^{50(I+1)} \cdot q_A(x_i)\}$, the set of potential candidates for $\tau$ from A's viewpoint. Similarly, denote $\mathcal{B} := \{i \leq T : \alpha_i \leq 2^{50(I+1)} \cdot q_B(x_i) \text{ and } \beta_i \leq p_B(x_i)\}$.

The protocol $\Pi_1$ is very simple. Alice takes her bet on the first element $a \in \mathcal{A}$ and sends it to Bob. Bob outputs $a$ only if (it just so happens that) $\beta_\tau \leq p_B(a)$. The details are given in Figure 2 in the appendix.



We turn to analyze $\Pi_1$. Denote the set of "Good" elements by

$$\mathcal{G} \stackrel{def}{=} \{x : 2^{50(I+1)} \cdot \nu_A(x) \geq \mu(x) \text{ and } 2^{50(I+1)} \cdot \nu_B(x) \geq \mu(x)\}.$$

Then by Claim 3, $\mu(\mathcal{G}) \geq 48/50 = 24/25$. The following claim asserts that if it succeeds, the output of $\Pi_1$ has the "correct" distribution on elements in $\mathcal{G}$.

Proposition 2. Assume $\mathcal{A}$ is nonempty. Then for any $x_i \in \mathcal{U}$, the probability that $\Pi_1$ outputs $x_i$ is at most $\mu(x_i) \cdot 2^{-50(I+1)}$. If $x_i \in \mathcal{G}$, then this probability is exactly $\mu(x_i) \cdot 2^{-50(I+1)}$.

Proof. Note that if $\mathcal{A}$ is nonempty, then for any $x_i \in \mathcal{U}$, the probability that $x_i$ is the first element in $\mathcal{A}$ (i.e., $a = x_i$) is $p_A(x_i) q_A(x_i) = \nu_A(x_i)$. By construction, the probability that $\beta_i \leq p_B(a)$ is $\min\{p_B(x_i)/(2^{50(I+1)} q_A(x_i)), 1\}$, and thus

$$\Pr[\Pi_1 \text{ outputs } x_i] \leq p_A(x_i) q_A(x_i) \cdot \frac{p_B(x_i)}{2^{50(I+1)} q_A(x_i)} = \mu(x_i) \cdot 2^{-50(I+1)}.$$

On the other hand, if $x_i \in \mathcal{G}$, then we know that $p_B(x_i)/q_A(x_i) = \mu(x_i)/\nu_A(x_i) \leq 2^{50(I+1)}$, and so the probability that $\beta_i \leq p_B(a)$ is exactly $p_B(x_i)/(2^{50(I+1)} q_A(x_i))$. Since $\Pr[\Pi_1 \text{ outputs } x_i] = \Pr[a = x_i] \Pr[\beta_i \leq p_B(a)]$ (assuming $\mathcal{A}$ is nonempty), we conclude that:

$$x_i \in \mathcal{G} \implies \Pr[\Pi_1 \text{ outputs } x_i] = p_A(x_i) q_A(x_i) \cdot \frac{p_B(x_i)}{2^{50(I+1)} q_A(x_i)} = \mu(x_i) \cdot 2^{-50(I+1)}.$$
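The calculation in Proposition 2 can be replayed exactly on a toy instance. The sketch below uses exact rational arithmetic; the four-element universe, the particular values of $p_A$, $p_B$, $q_A$, and the small constant `K` standing in for the scaling factor $2^{50(I+1)}$ are all hypothetical choices made for illustration. It tests only the $\nu_A$ half of the condition defining $\mathcal{G}$, which is the half the proof above uses.

```python
from fractions import Fraction as F

K = F(4)  # hypothetical stand-in for the scaling factor 2^{50(I+1)}

# Toy factorization over a universe of 4 elements (illustrative values).
p_A = [F(1, 2)] * 4
p_B = [F(4, 5), F(3, 5), F(2, 5), F(1, 5)]
q_A = [F(1, 10), F(1, 2), F(3, 5), F(4, 5)]

mu = [a * b for a, b in zip(p_A, p_B)]      # mu(x)   = p_A(x) * p_B(x)
nu_A = [a * q for a, q in zip(p_A, q_A)]    # nu_A(x) = p_A(x) * q_A(x)

def output_probability(x):
    # Pr[a = x] = nu_A(x); Bob then accepts with prob. min{p_B / (K q_A), 1}
    return nu_A[x] * min(p_B[x] / (K * q_A[x]), F(1))

out = [output_probability(x) for x in range(4)]
good = [x for x in range(4) if K * nu_A[x] >= mu[x]]

for x in range(4):
    print(x, out[x], mu[x] / K, x in good)
```

On the good elements the output probability equals $\mu(x)/K$ exactly, while on the remaining element the `min` truncates at 1 and the output probability falls strictly below $\mu(x)/K$.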



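We are now ready to estimate the success probability of the protocol.

Proposition 3. Let $S$ denote the event that $\mathcal{A} \neq \emptyset$ and $a \in \mathcal{B}$ (i.e., that the protocol succeeds). Then

$$0.9 \cdot 2^{-50(I+1)} \leq \Pr[S] \leq 2^{-50(I+1)}.$$

Proof. Using Proposition 2 we have that

$$\Pr[S] \leq \Pr[a \in \mathcal{B} \mid \mathcal{A} \neq \emptyset] = \sum_{i \in \mathcal{U}} \Pr[a = x_i] \Pr[\beta_i \leq p_B(a)] \leq \sum_{i \in \mathcal{U}} \mu(x_i) \cdot 2^{-50(I+1)} = 2^{-50(I+1)}. \quad (3)$$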
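For the lower bound, we have

$$\begin{aligned}
\Pr[S] &\geq \Pr[\beta_i \leq p_B(a) \mid \mathcal{A} \neq \emptyset] \cdot \Pr[\mathcal{A} \neq \emptyset] \\
&\geq (1 - 2^{-60I}) \Bigl( \sum_{i \in \mathcal{U}} \Pr[a = x_i] \Pr[\beta_i \leq p_B(a)] \Bigr) \\
&\geq (1 - 2^{-60I}) \Bigl( \sum_{i \in \mathcal{G}} \Pr[a = x_i] \Pr[\beta_i \leq p_B(a)] \Bigr) \\
&= (1 - 2^{-60I}) \Bigl( 2^{-50(I+1)} \sum_{i \in \mathcal{G}} \mu(x_i) \Bigr) = (1 - 2^{-60I}) \bigl( 2^{-50(I+1)} \mu(\mathcal{G}) \bigr) \\
&\geq \tfrac{24}{25} (1 - 2^{-60I}) 2^{-50(I+1)} \geq 0.9 \cdot 2^{-50(I+1)} \quad (4)
\end{aligned}$$

where the equality follows again from Proposition 2. This proves the second claim of Lemma 3.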

The following claim asserts that if $S$ occurs, then the distribution of $a$ is indeed close to $\mu$.

Claim 4. Let $\mu_1$ be the distribution of $a|S$. Then $|\mu_1 - \mu| \leq 2/9$.

Proof. The claim follows directly from Proposition 3. We defer the proof to the appendix.

We turn to the "Furthermore" part of Lemma 3. The protocol $\Pi_1$ satisfies the premises of the lemma, except it has a high communication cost. This is due to the fact that Alice explicitly sends $a$ to Bob. To reduce the communication, Alice will instead send $O(I)$ random hash values of $a$, and Bob will add corresponding consistency constraints to his set of candidates. The final protocol $\Pi_2$ is given in Figure 1.

Information-cost sampling protocol $\Pi_2$

1. Alice computes the set $\mathcal{A}$. Bob computes the set $\mathcal{B}$.

2. If $\mathcal{A} = \emptyset$, the protocol fails. Otherwise, Alice finds the first element $a \in \mathcal{A}$ and sets $x_A = a$. She then computes $d = \lceil 211 I \rceil$ random hash values $h_1(a), \ldots, h_d(a)$, where the hash functions are evaluated using public randomness.

3. Alice sends the values $\{h_j(a)\}_{1 \leq j \leq d}$ to Bob.

4. Bob finds the first index $\tau$ such that there is a $b \in \mathcal{B}$ for which $h_j(b) = h_j(a)$ for $j = 1..d$ (if such a $\tau$ exists). Bob outputs $x_B = x_\tau$. If there is no such index, the protocol fails.

5. Bob outputs $x_B$ ("success").

6. Alice outputs $x_A$.

Fig. 1. The sampling protocol $\Pi_2$ from Lemma 3

Let $\mathcal{E}$ denote the event that in step 4 of the protocol, Bob finds an element $x_i \neq a$ (that is, the event that the protocol outputs "success" but $x_A \neq x_B$). We upper bound the probability of $\mathcal{E}$. Given $a \in \mathcal{A}$ and $x_i \in \mathcal{B}$ such that $a \neq x_i$, the probability (over possible choices of the hash functions) that $h_j(a) = h_j(x_i)$ for $j = 1..d$ is $2^{-d} \leq 2^{-211I}$. For any $t$, $P[t \in \mathcal{B}] \leq \frac{1}{|\mathcal{U}|} \sum_{x_t \in \mathcal{U}} p_B(x_t) q_B(x_t) \cdot 2^{50(I+1)} = \frac{1}{|\mathcal{U}|} \sum_{x_t \in \mathcal{U}} \nu_B(x_t) \cdot 2^{50(I+1)} = 2^{50(I+1)}/|\mathcal{U}|$.

Thus, by a union bound we have

$$P[\mathcal{E}] \leq P[\exists x_i \in \mathcal{B} \text{ s.t. } x_i \neq a \ \wedge\ h_j(a) = h_j(x_i)\ \forall j = 1, \ldots, d] \leq T \cdot \frac{2^{50(I+1)}}{|\mathcal{U}|} \cdot 2^{-d} = 2^{150(I+1)} \cdot 60I \cdot 2^{-211I} \leq 2^{-60I}. \quad (5)$$

By a slight abuse of notation, let $\Pi_2$ be the distribution of $\Pi_2$'s output. Similarly, denote by $\Pi_1$ the distribution of the output of protocol $\Pi_1$. Note that if $\mathcal{E}$ does not occur, then the outcome of the execution of $\Pi_2$ is identical to the outcome of $\Pi_1$. Since $P[\mathcal{E}] \leq 2^{-60I}$, we have

$$|\Pi_2 - \Pi_1| = \Pr[\mathcal{E}] \cdot \bigl| [\Pi_2 | \mathcal{E}] - [\Pi_1 | \mathcal{E}] \bigr| \leq 2 \cdot 2^{-60I} \leq 2^{-59I},$$

which finishes the proof of the lemma.
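The $2^{-d}$ collision probability used in the union bound holds for any family of independent one-bit hashes that separates each pair of distinct strings with probability exactly 1/2. The paper does not fix a hash family; the sketch below assumes the standard inner-product hash over GF(2) as one such choice and verifies the per-hash collision probability exhaustively for 4-bit strings.

```python
from itertools import product

def h(r, a):
    """One-bit hash with seed r: inner product of r and a over GF(2)."""
    return sum(ri * ai for ri, ai in zip(r, a)) % 2

n = 4  # illustrative string length
strings = list(product([0, 1], repeat=n))

def collision_fraction(a, b):
    # Fraction of seeds r under which a and b receive the same hash bit.
    return sum(h(r, a) == h(r, b) for r in strings) / len(strings)

collision_rates = {collision_fraction(a, b)
                   for a in strings for b in strings if a != b}

d = 8                       # number of independent hash bits (illustrative)
all_d_collide = 0.5 ** d    # all d independent seeds collide for a != b
print(collision_rates, all_d_collide)
```

Every pair of distinct strings collides under exactly half of the seeds, so $d$ independently seeded hash bits all collide with probability $2^{-d}$, which is the quantity plugged into (5).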

Remark 2. The communication cost of the sampling protocol $\Pi_2$ can be reduced from $O(I_\mu)$ to $O(1)$ (more precisely, to only two bits) in the following way: Instead of having Alice compute the hash values privately and send them to Bob in steps 2 and 3 of the protocol, the players can use their shared randomness to sample $d = O(I_\mu)$ random hash values $h_1(b_1), \ldots, h_d(b_d)$ (where the $b_i$'s are random independent strings in $\mathcal{U}$), and Alice will only send Bob a single bit indicating whether those hash values match the hashing of her string $a$ (i.e., $h_i(b_i) = h_i(a)$ for all $i \in [d]$). In step 4 Bob will act as before, albeit comparing the hashes of his candidate $b$ to the random hashes $h_i(b_i)$, and output success ("1") if the hashes match. Note that this modification incurs an additional loss of $2^{-d} = 2^{-O(I_\mu)}$ in the success probability of the protocol (as this is the probability that $h_i(b_i) = h_i(a)$ for all $i \in [d]$), but since the success probability we are shooting for is already of the order $2^{-O(I_\mu)}$, we can afford this loss. This modification was observed in [KLL+12].



With Lemma 3 in hand, we are now ready to prove Theorem 2.
Proof of Theorem 2. Let $\pi$ be a protocol that realizes the value $I_\mu := \mathrm{IC}_\mu(f, 1/10)$. In other words, $\pi$ has an error rate of at most 1/10 and information cost of at most $I_\mu$ with respect to $\mu$. Denote by $\pi_{xy}$ the random variable that represents the transcript of $\pi$ given the inputs $(x,y)$, and by $\pi_x$ (resp. $\pi_y$) the protocol conditioned on only the input $x$ (resp. $y$). We denote by $\pi_{XY}$ the transcripts where $(X,Y)$ are also a pair of random variables. By Proposition 1 we know that

$$I_\mu = I(X; \pi_{XY} | Y) + I(Y; \pi_{XY} | X) = E_{(x,y) \sim \mu}\bigl[ D(\pi_{xy}\|\pi_x) + D(\pi_{xy}\|\pi_y) \bigr]. \quad (6)$$

Let us now run the sampling algorithm $\Pi_1$ from Lemma 3, with the distribution $\mu$ taken to be $\pi_{xy}$, the distributions $\nu_A$ and $\nu_B$ taken to be $\pi_x$ and $\pi_y$ respectively, and $I$ taken to be $20 I_\mu$.

At each node $v$ of the protocol tree that is owned by player X, let $p_0(v)$ and $p_1(v) = 1 - p_0(v)$ denote the probabilities that the next bit sent by X is 0 and 1, respectively. For nodes owned by player Y, let $q_0(v)$ and $q_1(v) = 1 - q_0(v)$ denote the probabilities that the next bit sent by Y is 0 and 1, respectively, as estimated by player X given the input $x$. For each leaf $\ell$ let $p_X(\ell)$ be the product of all the values of $p_b(v)$ from the nodes that are owned by X along the path from the root to $\ell$; let $q_X(\ell)$ be the product of all the values of $q_b(v)$ from the nodes that are owned by Y along the path from the root to $\ell$. The values $p_Y(\ell)$ and $q_Y(\ell)$ are defined similarly. For each $\ell$ we have $P[\pi_{xy} = \ell] = p_X(\ell) \cdot p_Y(\ell)$, $P[\pi_x = \ell] = p_X(\ell) \cdot q_X(\ell)$, and $P[\pi_y = \ell] = p_Y(\ell) \cdot q_Y(\ell)$. Thus we can apply Lemma 3 so as to obtain the following protocol $\pi'$ for computing $f$:

- If $\Pi_1$ fails, we return a random unbiased coin flip.

- If $\Pi_1$ succeeds, we return the final bit of the transcript sample $T$. Denote this bit by $T_{out}$.

To prove the correctness of the protocol, let $Z$ denote the event that both $D(\pi_{xy}\|\pi_x) \leq 20 I_\mu$ and $D(\pi_{xy}\|\pi_y) \leq 20 I_\mu$. By (6) and the Markov inequality, $\Pr[Z] \geq 19/20$ (where the probability is taken with respect to $\mu$). Denote by $\delta$ the probability that $\Pi_1$ succeeds. By the assertions of Lemma 3, $\delta \geq 0.9 \cdot 2^{-50(I+1)}$. Furthermore, if $\Pi_1$ succeeds, then we have $|T - \pi_{xy}| \leq 2/9$, which in particular implies that $P[T_{out} = \pi_{out}] \geq 7/9$. Finally, $P[\pi_{out} = f(x,y)] \geq 9/10$, since $\pi$ has error at most 1/10 with respect to $\mu$. Now, let $W$ denote the indicator variable whose value is 1 iff $\pi'(x,y) = f(x,y)$. Putting together the above,

$$E[W \mid Z] \geq (1 - \delta) \cdot \frac{1}{2} + \delta \cdot \Bigl( \frac{7}{9} - \frac{1}{10} \Bigr) > \frac{1}{2} + \delta \cdot \frac{1}{6} > \frac{1}{2} + \frac{1}{8} \cdot 2^{-50(I+1)}. \quad (7)$$

On the other hand, note that by Lemma 3 the probability that $\Pi_1$ succeeds is at most $2^{-50(I+1)}$ (no matter how large $D(\pi_{xy}\|\pi_x)$ and $D(\pi_{xy}\|\pi_y)$ are!), and so
$E[W \mid \neg Z] \geq (1 - 2^{-50(I+1)})/2$.
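Hence we conclude that

$$\begin{aligned}
E[W] &= E[W \mid Z] \cdot P[Z] + E[W \mid \neg Z] \cdot P[\neg Z] \\
&\geq \Bigl( \frac{1}{2} + \frac{1}{8} \cdot 2^{-50(I+1)} \Bigr) \cdot \frac{19}{20} + \Bigl( 1 - 2^{-50(I+1)} \Bigr) \cdot \frac{1}{2} \cdot \frac{1}{20} \\
&\geq \frac{1}{2} + \frac{1}{12} \cdot 2^{-50(I+1)} \geq \frac{1}{2} + \frac{1}{12} \cdot 2^{-1000(I_\mu+1)}.
\end{aligned}$$

Finally, Lemma 3 asserts that $|\Pi_1 - \Pi_2| \leq 2^{-59I}$. Thus if we replace $\Pi_1$ by $\Pi_2$ in the execution of protocol $\pi'$, the success probability decreases by at most $2^{-59I} \ll \frac{1}{12} \cdot 2^{-50(I+1)}$. Furthermore, the amount of communication used by $\pi'$ is now

$$211 I = 4220 I_\mu = O(I_\mu).$$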
Hence we conclude that with this modification, $\pi'$ has the following properties:

- $\mathrm{CC}(\pi') = 4220 \cdot I_\mu$;

- $P_{(x,y) \sim \mu}[\pi'(x,y) = f(x,y)] \geq 1/2 + 2^{-1000(I_\mu+1)-4}$;
which completes the proof.
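The numerical constants in the proof of Theorem 2 can be double-checked with exact rational arithmetic. In the sketch below, `t` in the comments stands for $2^{-50(I+1)}$ and `d` for the success probability $\delta$; the variable names are illustrative and not part of the paper.

```python
from fractions import Fraction as F

# Equation (7): E[W|Z] >= (1-d)/2 + d*(7/9 - 1/10) = 1/2 + d*gain
gain = F(7, 9) - F(1, 10) - F(1, 2)
step_7a = gain > F(1, 6)                  # hence E[W|Z] > 1/2 + d/6
step_7b = F(9, 10) * F(1, 6) > F(1, 8)    # d >= 0.9*t gives d/6 > t/8

# E[W] >= (1/2 + t/8)*(19/20) + (1 - t)*(1/2)*(1/20)
constant_term = F(1, 2) * F(19, 20) + F(1, 2) * F(1, 20)
t_coefficient = F(1, 8) * F(19, 20) - F(1, 2) * F(1, 20)
step_avg = constant_term == F(1, 2) and t_coefficient >= F(1, 12)

# Final bookkeeping: I = 20*I_mu with 211*I hash bits, and 1/12 >= 2^-4
comm_per_unit = 211 * 20
step_final = F(1, 12) >= F(1, 16)

print(gain, t_coefficient, comm_per_unit)
print(step_7a, step_7b, step_avg, step_final)
```

Each boolean corresponds to one inequality in the chain above, so all four should print as `True`.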

Remark 3. Using similar techniques, it was recently shown in [Bra11] that any function $f$ whose information complexity is $I$ has communication cost at most $2^{O(I)}$, thus implying that $\mathrm{IC}(f) \geq \Omega(\log(\mathrm{CC}(f)))$. We note that this result can be easily derived (up to constant factors) from Theorem 2. Indeed, applying the "compressed" protocol $2^{O(I)} \log(1/\epsilon)$ independent times and taking a majority vote guarantees an error of at most $\epsilon$ (by a standard Chernoff bound), with communication $O(I) \cdot 2^{O(I)} = 2^{O(I)}$. Thus, our result is strictly stronger than the former one.
A Communication Complexity

Let $\mathcal{X}, \mathcal{Y}$ denote the set of possible inputs to the two players, who we name A and B. We view a private coins protocol for computing a function $f : \mathcal{X} \times \mathcal{Y} \to \mathbb{Z}_K$ as a rooted tree with the following structure:

— Each non-leaf node is owned by A or by B. 

— Each non-leaf node owned by a particular player has a set of children that 
are owned by the other player. Each of these children is labeled by a binary 
string, in such a way that this coding is prefix free: no child has a label that 
is a prefix of another child. 

— Every node is associated with a function mapping $\mathcal{X}$ to distributions on 
children of the node and a function mapping $\mathcal{Y}$ to distributions on children 
of the node. 

— The leaves of the protocol are labeled by output values. 

A public coin protocol is a distribution on private coins protocols, run by 
first using shared randomness to sample an index $r$ and then running the 
corresponding private coin protocol $\pi_r$. Every private coin protocol is thus a public 
coin protocol. The protocol is called deterministic if all distributions labeling the 
nodes have support size 1. 

Definition 8. The communication cost (or communication complexity) of a 
public coin protocol $\pi$, denoted $CC(\pi)$, is the maximum number of bits that can 
be transmitted in any run of the protocol. 

Definition 9. The number of rounds of a public coin protocol is the maximum 
depth of the protocol tree $\pi_r$ over all choices of the public randomness. 
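
The tree view of a protocol and Definitions 8 and 9 can be illustrated with a small data structure. This is a minimal sketch under simplifying assumptions: the per-node functions mapping inputs to distributions over children are omitted, a public coin protocol is represented as a plain list of private-coin trees (one per value of $r$), and all names (`Node`, `communication_cost`, `rounds`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    owner: Optional[str] = None      # "A" or "B"; None for a leaf
    output: Optional[int] = None     # output value labelling a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)  # binary label -> child


def is_prefix_free(labels) -> bool:
    """True if no label is a prefix of another (checked on sorted neighbours)."""
    ordered = sorted(labels)
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def communication_cost(node: Node) -> int:
    """Maximum number of bits on any root-to-leaf path of one private-coin tree."""
    if not node.children:
        return 0
    return max(len(label) + communication_cost(child)
               for label, child in node.children.items())


def depth(node: Node) -> int:
    """Maximum number of messages on any root-to-leaf path."""
    if not node.children:
        return 0
    return 1 + max(depth(child) for child in node.children.values())


def rounds(trees: List[Node]) -> int:
    """Definition 9, with the public randomness modelled as a list of trees."""
    return max(depth(tree) for tree in trees)


# A wants to send "0", "10" or "11"; after "10", B replies with one bit.
reply = Node(owner="B", children={"0": Node(output=1), "1": Node(output=0)})
root = Node(owner="A", children={"0": Node(output=0), "10": reply, "11": Node(output=1)})
print(communication_cost(root), rounds([root]), is_prefix_free(root.children))
```

The communication cost of a public coin protocol would then be the maximum of `communication_cost` over the trees in the list.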



B Proof of Claim 3 (from [Bra11]) 

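Proof. Recall that $D(\mu\|\nu) = \sum_{x \in \mathcal{U}} \mu(x) \log\frac{\mu(x)}{\nu(x)}$. Denote by $\mathcal{N} = \{x : \mu(x) < \nu(x)\}$ 
the terms that contribute a negative amount to $D(\mu\|\nu)$. First we observe 
that for all $0 < x < 1$, $x \log x > -1$, and thus 

\[ \sum_{x \in \mathcal{N}} \mu(x) \log\frac{\mu(x)}{\nu(x)} = \sum_{x \in \mathcal{N}} \nu(x)\cdot\frac{\mu(x)}{\nu(x)} \log\frac{\mu(x)}{\nu(x)} > \sum_{x \in \mathcal{N}} \nu(x)\cdot(-1) \ge -1. \]

Denote by $\mathcal{L} = \{x : 2^{(I+1)/\epsilon}\cdot\nu(x) < \mu(x)\}$; we need to show that $\mu(\mathcal{L}) < \epsilon$. 

For each $x \in \mathcal{L}$ we have $\log\frac{\mu(x)}{\nu(x)} > (I+1)/\epsilon$. Thus 

\[ I \ge D(\mu\|\nu) \ge \sum_{x \in \mathcal{L}} \mu(x) \log\frac{\mu(x)}{\nu(x)} + \sum_{x \in \mathcal{N}} \mu(x) \log\frac{\mu(x)}{\nu(x)} > \mu(\mathcal{L})\cdot\frac{I+1}{\epsilon} - 1, \]

implying $\mu(\mathcal{L}) < \epsilon$. 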
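
As a numerical illustration of Claim 3, the sketch below draws random pairs of distributions, sets $I = D(\mu\|\nu)$, and measures the $\mu$-mass of the set of points where $\mu(x)$ exceeds $2^{(I+1)/\epsilon}\,\nu(x)$. It assumes logarithms are base 2 and compares in the log domain to avoid overflow; the helper names are illustrative and the random test distributions are an arbitrary choice.

```python
import math
import random


def kl_divergence(mu, nu):
    """D(mu || nu) in bits; assumes nu(x) > 0 wherever mu(x) > 0."""
    return sum(m * math.log2(m / v) for m, v in zip(mu, nu) if m > 0)


def heavy_mass(mu, nu, info, eps):
    """mu-mass of {x : 2^{(info+1)/eps} * nu(x) < mu(x)}, compared in log scale."""
    threshold = (info + 1) / eps
    return sum(m for m, v in zip(mu, nu) if m > 0 and math.log2(m / v) > threshold)


def random_distribution(size, rng):
    weights = [rng.random() ** 4 + 1e-9 for _ in range(size)]  # skewed, strictly positive
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
worst = 0.0
for _ in range(200):
    mu, nu = random_distribution(16, rng), random_distribution(16, rng)
    info = kl_divergence(mu, nu)
    for eps in (0.1, 0.5, 0.9):
        worst = max(worst, heavy_mass(mu, nu, info, eps) / eps)
print("largest observed mass / eps:", worst)
```

A ratio below 1 in every trial is what the claim predicts; the experiment illustrates the statement and is no substitute for the proof.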



C Proof of Claim 4 

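Proof. For any $x_i \in \mathcal{U}$, 

\[ \mu_1(x_i) = \Pr(a = x_i \mid S) \le (1 + 1/9)\,\mu(x_i), \tag{8} \]

where the inequality follows from Proposition 3. Hence, 

\[ |\mu_1 - \mu| = 2 \sum_{x_i :\, \mu_1(x_i) \ge \mu(x_i)} \big(\mu_1(x_i) - \mu(x_i)\big) \le 2/9. \]

This proves claim (3) of the lemma. 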

D Proof of Lemma 1: The discrepancy of the 
Greater-Than function 

We consider the Greater-Than function on $n$-bit strings. We start by defining 
the "hard" distribution. A pair $(x, y)$ is sampled as follows: 

1. Sample an index $k \in \{1, \dots, n\}$ uniformly at random. 

2. Sample $z_1, \dots, z_{k-1}, w, x_{k+1}, \dots, x_n, y_{k+1}, \dots, y_n$ as uniformly random bits. 

3. Let $x = z_1, \dots, z_{k-1}, w, x_{k+1}, \dots, x_n$ and $y = z_1, \dots, z_{k-1}, \bar{w}, y_{k+1}, \dots, y_n$. 

Denote this distribution by $\mu_n$. Let $GT_n(x,y) = 1$ iff $x > y$. We will prove the 
following Lemma: 

Lemma 4. The discrepancy of $GT_n$ with respect to $\mu_n$ satisfies 

\[ \mathrm{Disc}_{\mu_n}(GT_n) \le \frac{20}{\sqrt{n}}. \]
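
The hard distribution $\mu_n$ is easy to sample, which can help when experimenting with Lemma 4 on small $n$. The sketch below follows the three sampling steps, assuming bit strings are written most significant bit first and that the $k$-th bit of $y$ is the complement $\bar{w}$ of the $k$-th bit of $x$ (the cross terms in the proof require the two strings to differ at position $k$). The function names are illustrative.

```python
import random


def sample_hard_pair(n, rng):
    """Sample (k, x, y) from the hard distribution; x and y are lists of n bits."""
    k = rng.randint(1, n)                            # step 1: k uniform in {1, ..., n}
    z = [rng.randint(0, 1) for _ in range(k - 1)]    # step 2: shared prefix z_1..z_{k-1}
    w = rng.randint(0, 1)                            #         the bit at position k
    x_tail = [rng.randint(0, 1) for _ in range(n - k)]
    y_tail = [rng.randint(0, 1) for _ in range(n - k)]
    x = z + [w] + x_tail                             # step 3 (assumption: y gets 1 - w)
    y = z + [1 - w] + y_tail
    return k, x, y


def greater_than(x, y):
    """GT_n(x, y) = 1 iff x > y; list comparison is lexicographic, i.e. numeric here."""
    return int(x > y)


rng = random.Random(0)
k, x, y = sample_hard_pair(8, rng)
print(k, x, y, greater_than(x, y))
```

Under this distribution the value of $GT_n(x,y)$ is determined by the single bit $w$ at the uniformly random position $k$, which is what makes the function hard to predict on any large rectangle.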

In fact, to facilitate an inductive proof, we will show a slightly stronger 
statement: 



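Lemma 5. Let $R = S \times T$ be a rectangle in $\{0,1\}^n \times \{0,1\}^n$. Let $s := |S|/2^n$ 
and $t := |T|/2^n$ be the uniform size of $S$ and $T$ respectively. Then 

\[ \mathrm{Disc}_{\mu_n}(GT_n, R) \le \frac{20\sqrt{st}}{\sqrt{n}}. \]

Note that Lemma 5 immediately implies Lemma 4, since $s, t \le 1$. 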

Proof. We prove Lemma 5 by induction on $n$. The statement is trivially true for 
$n = 1$. Assume the statement is true for $n - 1$. Our goal is to prove it for $n$. Let 
$R = S \times T$ be any rectangle in $\{0,1\}^n \times \{0,1\}^n$. By a slight abuse of notation 
we write: 

\[ \mathrm{Disc}_{\mu_n}(GT_n, R) = \Pr[f(x,y) = 1 \wedge (x,y) \in R] - \Pr[f(x,y) = 0 \wedge (x,y) \in R], \]

and prove an upper bound on this quantity (without $|\cdot|$). The matching upper 
bound on $-\mathrm{Disc}_{\mu_n}(GT_n, R)$ follows by an identical argument. 

Let $s := |S|/2^n$ and $t := |T|/2^n$. Denote by $S_0$ the set of strings in $S$ that 
begin with a 0, and $S_1 := S \setminus S_0$. Similarly, define $T_0$ and $T_1$. Further, let 
$p := |S_0|/|S|$ and $q := |T_0|/|T|$. 

Note that restricted to $S_0 \times T_0$, $\mu_n$ is the same distribution as $\mu_{n-1}$, scaled 
by a factor of $\frac{n-1}{2n}$. Moreover, $s_0 := |S_0|/2^{n-1} = ps2^n/2^{n-1} = 2ps$. Similarly, 
$s_1 := |S_1|/2^{n-1} = 2(1-p)s$, $t_0 := |T_0|/2^{n-1} = 2qt$, $t_1 := |T_1|/2^{n-1} = 2(1-q)t$. 
Putting these pieces together, and applying the inductive hypothesis we get: 

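\[
\begin{aligned}
\mathrm{Disc}_{\mu_n}(GT_n, S \times T) &= \mathrm{Disc}_{\mu_n}(GT_n, S_0 \times T_0) + \mathrm{Disc}_{\mu_n}(GT_n, S_1 \times T_1) \\
&\quad + \mathrm{Disc}_{\mu_n}(GT_n, S_1 \times T_0) + \mathrm{Disc}_{\mu_n}(GT_n, S_0 \times T_1) \\
&= \frac{n-1}{2n}\,\mathrm{Disc}_{\mu_{n-1}}(GT_{n-1}, S_0 \times T_0) + \frac{n-1}{2n}\,\mathrm{Disc}_{\mu_{n-1}}(GT_{n-1}, S_1 \times T_1) \\
&\quad + \frac{2}{n}(1-p)sqt - \frac{2}{n}ps(1-q)t \\
&\le \frac{1}{n}\Big(\sqrt{n-1}\cdot\big(20\sqrt{pq}\cdot\sqrt{st} + 20\sqrt{(1-p)(1-q)}\cdot\sqrt{st}\big) + 2(q-p)st\Big).
\end{aligned}
\tag{9}
\]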



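If $q - p \le 0$, we continue (9) as follows: 

\[ \mathrm{RHS} \le \frac{\sqrt{n-1}}{n}\cdot 20\sqrt{st}\cdot\big(\sqrt{pq} + \sqrt{(1-p)(1-q)}\big) \le \frac{20\sqrt{st}}{\sqrt{n}}, \]

where the last inequality follows from simple calculations. 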



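On the other hand, in the more difficult case when $q - p > 0$, we use the fact 
that $st \le 1$ to continue (9) as follows: 

\[
\begin{aligned}
\mathrm{RHS} &\le \frac{1}{n}\Big(\sqrt{n-1}\cdot\big(20\sqrt{pq}\cdot\sqrt{st} + 20\sqrt{(1-p)(1-q)}\cdot\sqrt{st}\big) + 2(q-p)\sqrt{st}\Big) \\
&= \frac{20\sqrt{st}}{\sqrt{n}}\left(\sqrt{\frac{n-1}{n}}\,\big(\sqrt{pq} + \sqrt{(1-p)(1-q)}\big) + \frac{q-p}{10\sqrt{n}}\right).
\end{aligned}
\tag{10}
\]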



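Next, we use the readily verifiable facts that $\sqrt{\frac{n-1}{n}} \le 1 - \frac{1}{2n}$ and that 
$\sqrt{pq} + \sqrt{(1-p)(1-q)} \le 1 - (q-p)^2/4$, to continue (10) as follows: 

\[
\begin{aligned}
(10) &\le \frac{20\sqrt{st}}{\sqrt{n}}\left(\Big(1 - \frac{1}{2n}\Big)\Big(1 - \frac{(q-p)^2}{4}\Big) + \frac{q-p}{10\sqrt{n}}\right) \\
&\le \frac{20\sqrt{st}}{\sqrt{n}}\left(1 - \frac{1/(2n) + (q-p)^2/4}{2} + \frac{q-p}{10\sqrt{n}}\right) \\
&\le \frac{20\sqrt{st}}{\sqrt{n}}\left(1 - \frac{q-p}{\sqrt{8}\sqrt{n}} + \frac{q-p}{10\sqrt{n}}\right) \le \frac{20\sqrt{st}}{\sqrt{n}},
\end{aligned}
\tag{11}
\]

where the third-to-last inequality follows from the fact that for all $0 \le a, b \le 1$, 
$(1-a)(1-b) \le 1 - a/2 - b/2$, and the second-to-last one is an application of 
the AM-GM inequality. 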
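
The elementary inequalities used in the inductive step can be spot-checked numerically. The sketch below is a sanity check on a finite grid, not a proof; it assumes the facts in the form $\sqrt{(n-1)/n} \le 1 - 1/(2n)$, $\sqrt{pq} + \sqrt{(1-p)(1-q)} \le 1 - (q-p)^2/4$ and $(1-a)(1-b) \le 1 - a/2 - b/2$, and it checks the inductive step with the common factor $\sqrt{st}$ divided out. Function names and grid sizes are arbitrary choices.

```python
import math

TOL = 1e-12  # slack for floating-point rounding


def facts_hold(max_n: int = 200, grid: int = 50) -> bool:
    """Check the three elementary inequalities on a grid of values."""
    for n in range(1, max_n + 1):
        if math.sqrt((n - 1) / n) > 1 - 1 / (2 * n) + TOL:
            return False
    for i in range(grid + 1):
        for j in range(grid + 1):
            p, q = i / grid, j / grid
            affinity = math.sqrt(p * q) + math.sqrt((1 - p) * (1 - q))
            if affinity > 1 - (q - p) ** 2 / 4 + TOL:
                return False
            if (1 - p) * (1 - q) > 1 - p / 2 - q / 2 + TOL:
                return False
    return True


def inductive_step_holds(n: int, p: float, q: float) -> bool:
    """Check sqrt(n-1)/n * 20 * (sqrt(pq) + sqrt((1-p)(1-q))) + 2(q-p)/n <= 20/sqrt(n)."""
    affinity = math.sqrt(p * q) + math.sqrt((1 - p) * (1 - q))
    lhs = math.sqrt(n - 1) / n * 20 * affinity + 2 * (q - p) / n
    return lhs <= 20 / math.sqrt(n) + TOL


print(facts_hold())
print(all(inductive_step_holds(n, i / 20, j / 20)
          for n in range(2, 60) for i in range(21) for j in range(21)))
```

The inequality is tight when $p = q$ and $n$ is large, which is why the constant 20 cannot be recovered from the cross term alone and the $-1/(2n)$ slack from the first fact is needed.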



E Sampling protocol from Lemma 3 



Information-cost sampling protocol $\Pi_1$ 

1. Alice computes the set $A$. Bob computes the set $B$. 

2. If $A = \emptyset$, the protocol fails; otherwise Alice finds the first element $a \in A$, and 
sends $a$ to Bob. 

3. Bob checks if $a \in B$. If not, the protocol fails. 

4. Alice and Bob output $a$ ("success"). 

Fig. 2. The sampling protocol $\Pi_1$ from Lemma 3 
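
The four steps of Fig. 2 can be sketched as a single function. This is only an illustration of the control flow: how Alice and Bob compute their sets is defined in the lemma and is not modelled here, so the sets are taken as inputs. It is assumed that Alice's set comes with an ordering (a sequence whose first entry is "the first element") and that failure is signalled by returning `None`; the function name is hypothetical.

```python
from typing import Hashable, Optional, Sequence, Set


def sampling_protocol(alice_candidates: Sequence[Hashable],
                      bob_set: Set[Hashable]) -> Optional[Hashable]:
    """Return the common output a on success, or None if the protocol fails.

    alice_candidates: Alice's set A, listed in the agreed order (assumption).
    bob_set:          Bob's set B.
    """
    if not alice_candidates:      # step 2: A is empty, so the protocol fails
        return None
    a = alice_candidates[0]       # step 2: Alice sends the first element of A
    if a not in bob_set:          # step 3: Bob rejects if a is not in B
        return None
    return a                      # step 4: both players output a


print(sampling_protocol(["t3", "t1"], {"t1", "t3"}))
print(sampling_protocol(["t2", "t1"], {"t1"}))
```

Only the single element `a` is ever sent, so the communication is the cost of describing one element of $A$; the protocol trades a small success probability for this low cost.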



